Most gaze estimation research only works for setups in which the camera captures the eyes perfectly, and does not specify how to position the camera correctly for a given person. In this paper, we study gaze estimation under practical camera placements. We further take the study into a real-world application, using an inexpensive edge device in a realistic scene. Specifically, we first set up a shopping environment in which we want to capture customers' gaze behavior. This setup requires an optimal camera position so that the estimation accuracy of existing gaze estimation research is maintained. We then apply few-shot learning for gaze estimation to reduce the number of training samples needed at the inference stage. In experiments, we deploy our implementation on an NVIDIA Jetson TX2 and achieve a reasonable speed of 12 FPS, which is faster than our reference work, without degrading the estimation accuracy. The source code is released at https://github.com/linh-gist/gazeestimationtx2.
Recent years have seen great progress in decentralized learning with private data. Federated learning (FL) and split learning (SL) are two spearheads, each with its own advantages and disadvantages, being suited to many clients and to large models, respectively. To enjoy both benefits, hybrid approaches such as SplitFed have emerged of late, but their fundamentals are still elusive. In this work, we first identify the fundamental bottlenecks of SL and thereby propose a scalable SL framework, coined SGLR. The server under SGLR broadcasts a common gradient averaged at the split layer, emulating FL without any additional communication across clients. Meanwhile, SGLR splits the learning rate into server-side and client-side rates, which are tuned separately to support many clients. Simulation results corroborate that SGLR achieves higher accuracy than other baseline SL methods, including SplitFed, which consumes higher energy and communication costs. As a secondary result, we observe, through mutual information measurements, greater leakage of sensitive information under the baselines than under SGLR.
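A rough numerical sketch of the gradient-averaging idea in the SGLR abstract above may help. Everything below (model shapes, data, and learning rates) is an illustrative assumption, not the authors' implementation: each client computes the gradient at the split layer, the server broadcasts the average of those split-layer gradients back to every client, and server-side and client-side parameters use separately chosen learning rates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients = 2
W_client = [rng.normal(size=(4, 3)) for _ in range(n_clients)]  # client-side layers
W_server = rng.normal(size=(3, 1))                              # shared server-side layer
lr_server, lr_client = 0.02, 0.01                               # decomposed learning rates

xs = [rng.normal(size=(1, 4)) for _ in range(n_clients)]        # one toy sample per client
ys = [np.array([[1.0]]), np.array([[-1.0]])]

def total_loss():
    return sum(float(((xs[k] @ W_client[k] @ W_server - ys[k]) ** 2).sum())
               for k in range(n_clients))

loss_before = total_loss()
for _ in range(300):
    split_grads = []
    server_grad = np.zeros_like(W_server)
    for k in range(n_clients):
        h = xs[k] @ W_client[k]                  # client forward up to the split layer
        d_pred = 2 * (h @ W_server - ys[k])      # dLoss/dprediction (squared error)
        server_grad += h.T @ d_pred              # server-side weight gradient
        split_grads.append(d_pred @ W_server.T)  # gradient w.r.t. the split activation
    avg_split_grad = sum(split_grads) / n_clients  # server averages the split-layer grads...
    W_server -= lr_server * server_grad / n_clients
    for k in range(n_clients):
        # ...and every client backpropagates the SAME averaged gradient, emulating
        # FL-style averaging with no client-to-client communication.
        W_client[k] -= lr_client * xs[k].T @ avg_split_grad
loss_after = total_loss()
```

Even though each client applies an averaged rather than its own split-layer gradient, the joint loss still shrinks on this toy problem, which is the behavior the abstract's "emulating FL" claim alludes to.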
Advances in research have made it possible to deploy neural network algorithms in autonomous vehicles for perceiving the surroundings. The standard exteroceptive sensors used for perceiving the environment are cameras and LiDAR. Neural network algorithms developed with these exteroceptive sensors have therefore provided the necessary solutions for autonomous-vehicle perception. A major drawback of these exteroceptive sensors is their limited operability in adverse weather conditions, for instance, low illumination and nighttime. The availability and affordability of thermal cameras in the sensor suite of autonomous vehicles provide the necessary improvement for perception in adverse weather. The semantics of the environment benefit robust perception, which can be achieved by segmenting the different objects in the scene. In this work, we use a thermal camera for semantic segmentation. We design an attention-based recurrent convolutional network (RCNN) encoder-decoder architecture named ARTSeg for thermal semantic segmentation. The main contribution of this work is the design of the encoder-decoder architecture, which employs units of RCNN for each encoder and decoder block. Furthermore, additive attention is adopted in the decoder module to retain high-resolution features and improve feature localization. The efficacy of the proposed method is evaluated on a publicly available dataset, showing better performance than other state-of-the-art methods in mean intersection over union (IoU).
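The recurrent convolutional unit at the heart of the blocks described above can be sketched generically: a feed-forward convolution response is repeatedly combined with a recurrent convolution of the evolving state. This is a minimal numpy illustration of an RCL-style unit under assumed 3x3 kernels, not the ARTSeg implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """3x3 'same' cross-correlation (convolution without kernel flip), zero padding."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i:i + H, j:j + W]
    return out

def rcl_unit(x, k_ff, k_rec, steps=3):
    """Recurrent convolutional unit: refine the state for a fixed number of steps."""
    ff = conv2d_same(x, k_ff)      # feed-forward term, computed once
    h = np.maximum(ff, 0.0)        # ReLU
    for _ in range(steps):
        h = np.maximum(ff + conv2d_same(h, k_rec), 0.0)  # recurrent refinement
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))
k_ff = 0.3 * rng.normal(size=(3, 3))
k_rec = 0.1 * rng.normal(size=(3, 3))
y = rcl_unit(x, k_ff, k_rec)
```

With a zero recurrent kernel the unit degenerates to a plain ReLU-activated convolution, which makes the role of the recurrence explicit.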
Perception of the environment plays a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surroundings is analogous to human visual representation. The human brain perceives the environment by utilizing different sensory channels and developing a view-invariant representation model. Keeping this in context, different exteroceptive sensors are deployed on autonomous vehicles to perceive the environment. The most common exteroceptive sensors for autonomous-vehicle perception are cameras, LiDAR, and radar. Despite the benefits of these sensors in the visible spectrum domain, they have limited operational capability in adverse weather conditions, for instance at nighttime, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant representation by employing a self-supervised contrastive learning approach. To this end, we propose a deep neural network, Self-Supervised Thermal Network (SSTN), for learning feature representations that maximize the information between the visible and infrared spectrum domains via contrastive learning; these learned feature representations are then employed by a multi-scale encoder-decoder transformer network for thermal object detection. The proposed method is extensively evaluated on two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy of the proposed method.
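Contrastive objectives that "maximize the information between the visible and infrared spectrum domains," as the abstract above puts it, are commonly instantiated as an InfoNCE-style loss: paired visible/thermal embeddings of the same scene are pulled together, other scenes pushed apart. The sketch below is a generic stand-in for such a loss, not the authors' exact objective:

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.1):
    """z_a, z_b: (N, d) embeddings; row i of each is assumed to be the same scene."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)   # L2-normalize
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / tau                             # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))                # diagonal = positive pairs

rng = np.random.default_rng(1)
z = rng.normal(size=(4, 8))
loss_pos = info_nce(z, z)        # perfectly aligned cross-domain pairs
loss_neg = info_nce(z, z[::-1])  # every pair mismatched
```

Aligned pairs yield a much lower loss than mismatched ones, which is exactly the pressure that shapes a view-invariant representation.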
Autonomous vehicles are conceived to provide safe and secure services by validating safety standards such as SOTIF-ISO/PAS-21448 (Safety Of The Intended Functionality). In this context, perception of the environment plays an instrumental role in conjunction with the localization, planning, and control modules. As a pivotal algorithm in the perception stack, object detection provides extensive insight into the autonomous vehicle's surroundings. Cameras and LiDAR are the sensor modalities most widely used for object detection, but these exteroceptive sensors have limitations in resolution and in adverse weather conditions. In this work, radar-based object detection is explored, as radar provides a complementary sensor modality that can be deployed and used in adverse weather. Radar yields complex data, and for this purpose a channel-boosting feature ensemble method with a transformer encoder-decoder network is proposed. The object detection task using radar is formulated as a set prediction problem and evaluated on a publicly available dataset in both good and adverse weather conditions. The efficacy of the proposed method is evaluated extensively using the COCO evaluation metrics, and the best proposed model surpasses its state-of-the-art counterpart methods by 12.55% and 12.48%.
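The "set prediction" formulation mentioned above hinges on a one-to-one (bipartite) matching between predicted and ground-truth boxes that minimizes a pairwise cost; in practice this is solved with the Hungarian algorithm. A brute-force stand-in, fine for tiny sets and shown only to illustrate the matching step:

```python
import itertools
import numpy as np

def match_sets(cost):
    """cost[i, j]: cost of assigning prediction i to ground truth j (square matrix).
    Returns the permutation (pred -> gt) with minimal total cost."""
    n = cost.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i, perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# A case where greedy row-wise assignment fails but the optimal matching does not:
cost = np.array([[1.0, 2.0],
                 [1.0, 100.0]])
perm, total = match_sets(cost)
```

Here the optimal matching assigns prediction 0 to ground truth 1 and prediction 1 to ground truth 0 (total cost 3), whereas greedily taking the cheapest entry first would cost 101.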
Modern vehicles are equipped with various driver-assistance systems, including automatic lane keeping, which prevents unintended lane departure. Traditional lane detection methods employ handcrafted or deep-learning-based features followed by post-processing techniques for lane extraction, using frame-based RGB cameras. The use of frame-based RGB cameras for the lane detection task is prone to illumination variations, sun glare, and motion blur, which limit the performance of lane detection methods. Incorporating an event camera into the perception stack for autonomous driving is one of the most promising solutions for mitigating the challenges encountered with frame-based RGB cameras. The main contribution of this work is the design of a lane-marking detection model that employs a dynamic vision sensor. This paper explores the novel application of lane-marking detection using an event camera by designing a convolutional encoder followed by an attention-guided decoder. The spatial resolution of the encoded features is retained by an atrous spatial pyramid pooling (ASPP) block. The additive attention mechanism in the decoder improves performance on the high-dimensional encoded input features, promoting lane localization and alleviating post-processing computation. The efficacy of the proposed work is evaluated using the DVS Dataset for Lane Extraction (DET). Experimental results show improvements of 5.54% and 5.03% in the multi-class and binary lane-marking detection tasks, respectively. Furthermore, the intersection-over-union ($IoU$) scores of the proposed method surpass those of the best state-of-the-art method by 6.50% and 9.37%, respectively.
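The additive attention mechanism described above can be sketched in the spirit of attention-gated decoders: a gating signal and the encoder feature are projected, summed, passed through a nonlinearity, and turned into a [0, 1] spatial mask that re-weights the encoder feature. Shapes and projection matrices below are illustrative assumptions, not the paper's design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def additive_attention_gate(enc, gate, W_e, W_g, psi):
    """enc, gate: (H*W, C) flattened feature maps; W_e, W_g: (C, Ci); psi: (Ci, 1)."""
    q = np.maximum(enc @ W_e + gate @ W_g, 0.0)  # additive combination + ReLU
    alpha = sigmoid(q @ psi)                     # per-location attention in (0, 1)
    return enc * alpha                           # attenuated encoder features

rng = np.random.default_rng(0)
HW, C, Ci = 12, 6, 4
enc = rng.normal(size=(HW, C))
gate = rng.normal(size=(HW, C))
W_e, W_g = rng.normal(size=(C, Ci)), rng.normal(size=(C, Ci))
psi = rng.normal(size=(Ci, 1))
out = additive_attention_gate(enc, gate, W_e, W_g, psi)
```

Because the mask lies in (0, 1), the gate can only suppress, never amplify, encoder responses; locations the decoder deems irrelevant to lane localization are attenuated.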
Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to obtain a concrete visual definition of the informative structures because the influence of visual features is task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes. To do this, we first analyze traditional spectral clustering methods, which compute a set of eigenvectors to model a segmented graph forming small compact structures on image domains. We then unfold the traditional graph-partitioning problem into a learnable network, named \textit{Scene Structure Guidance Network (SSGNet)}, to represent the task-specific informative structures. The SSGNet yields a set of coefficients of eigenvectors that produces explicit feature representations of image structures. In addition, our SSGNet is light-weight ($\sim$ 55K parameters), and can be used as a plug-and-play module for off-the-shelf architectures. We optimize the SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications, including joint upsampling and image denoising. We also demonstrate that our SSGNet generalizes well on unseen datasets, compared to existing methods that use structural embedding frameworks. Our source code is available at https://github.com/jsshin98/SSGNet.
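The spectral clustering machinery that motivates SSGNet can be illustrated on a toy affinity graph: the eigenvectors of the graph Laplacian encode compact structures, and in particular the eigenvector of the second-smallest eigenvalue (the Fiedler vector) separates weakly connected clusters by sign. This is a textbook illustration, not SSGNet itself:

```python
import numpy as np

# Toy symmetric affinity matrix: two tight 3-node clusters joined by a weak 0.01 edge.
A = np.array([[0,    1,    1,    0,    0, 0],
              [1,    0,    1,    0,    0, 0],
              [1,    1,    0,    0.01, 0, 0],
              [0,    0,    0.01, 0,    1, 1],
              [0,    0,    0,    1,    0, 1],
              [0,    0,    0,    1,    1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                            # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
fiedler = vecs[:, 1]                 # eigenvector of the second-smallest eigenvalue
labels = (fiedler > 0).astype(int)   # sign split recovers the two clusters
```

SSGNet, as described above, learns coefficients over such eigenvectors instead of computing an eigendecomposition per image, which is what makes the structure guidance task-specific and cheap.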
For change detection in remote sensing, constructing a training dataset for deep learning models is difficult due to the requirement of bi-temporal supervision. To overcome this issue, single-temporal supervision, which treats change labels as the difference of two semantic masks, has been proposed. This novel method trains a change detector using two spatially unrelated images with corresponding semantic labels such as building. However, training on unpaired datasets could confuse the change detector in the case of pixels that are labeled unchanged but are visually significantly different. In order to maintain the visual similarity in unchanged areas, in this paper, we emphasize that the change originates from the source image, and we show that manipulating the source image as an after-image is crucial to the performance of change detection. Extensive experiments demonstrate the importance of maintaining visual information between pre- and post-event images, and our method outperforms existing methods based on single-temporal supervision. Code is available at https://github.com/seominseok0429/Self-Pair-for-Change-Detection.
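A hedged sketch of the single-temporal "self-pair" idea described above: the post-event image is synthesized from the *same* source image by editing it under the semantic mask (here, simply erasing masked "building" pixels), so unchanged pixels stay visually identical and the change label is exactly the edited region. The editing operation is an illustrative assumption:

```python
import numpy as np

def make_self_pair(img, building_mask, fill=0.0):
    """img: (H, W, C) float image; building_mask: (H, W) bool semantic mask.
    Returns (pre_image, post_image, change_label)."""
    post = img.copy()
    post[building_mask] = fill           # simulate a removed/changed building
    change_label = building_mask.copy()  # change = exactly the edited pixels
    return img, post, change_label

rng = np.random.default_rng(0)
img = rng.uniform(size=(4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
pre, post, label = make_self_pair(img, mask)
```

By construction, every pixel labeled "unchanged" is pixel-identical across the pair, which is precisely the visual-similarity property the abstract argues unpaired training lacks.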
Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible for various reasons, including cost and issues related to privacy. By utilizing the learned parameters (statistics) of FP32-pre-trained models, zero-shot quantization schemes focus on generating synthetic data by minimizing the distance between the learned parameters ($\mu$ and $\sigma$) and the distributions of intermediate activations. Subsequently, they distill knowledge from the pre-trained model (\textit{teacher}) to the quantized model (\textit{student}) such that the quantized model can be optimized with the synthetic dataset. In general, zero-shot quantization comprises two major elements: synthesizing datasets and quantizing models. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and optimization as lengthy as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours or even half an hour. Furthermore, we propose a framework called Genie that generates data suited for post-training quantization. With the data synthesized by Genie, we can produce high-quality quantized models without real datasets, comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we obtain a unique state-of-the-art zero-shot quantization approach.
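The statistics-matching objective the abstract alludes to can be sketched on a toy scale: drive the mean and variance of an intermediate activation toward target statistics ($\mu$, $\sigma^2$) stored in the pre-trained model, by gradient descent on the synthetic inputs themselves. This single-linear-layer numpy stand-in (with an analytically derived gradient) only illustrates the objective, not Genie's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))          # a frozen "pre-trained" layer
mu_t = np.ones(8)                    # target mean   (stored BN-style statistic)
var_t = 2.0 * np.ones(8)             # target variance (stored BN-style statistic)
x = rng.normal(size=(64, 8))         # synthetic batch, to be optimized

def stats_loss(x):
    a = x @ W                        # intermediate activation
    return float(((a.mean(0) - mu_t) ** 2).sum() + ((a.var(0) - var_t) ** 2).sum())

def grad_x(x):
    """Analytic gradient of stats_loss w.r.t. the synthetic inputs."""
    N = x.shape[0]
    a = x @ W
    m, v = a.mean(0), a.var(0)
    dL_da = 2.0 * (m - mu_t) / N + 2.0 * (v - var_t) * 2.0 * (a - m) / N
    return dL_da @ W.T

loss_before = stats_loss(x)
for _ in range(500):
    x -= 0.02 * grad_x(x)            # optimize the data, not the model
loss_after = stats_loss(x)
```

The key inversion relative to ordinary training is visible in the update line: the model stays frozen and the *data* descends the loss, which is how synthetic calibration sets are produced without any real samples.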
We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla. While that work studies transformer-based large language models trained on the MassiveText corpus (Gopher), as a starting point for development of a mathematical theory, we focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and single hidden layer of ReLU activation units. We establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that this approximation correctly identifies an asymptotic linear compute-optimal scaling. This approximation can also generate new insights. Among other things, it suggests that, as the input space dimension or latent space complexity grows, as might be the case for example if a longer history of tokens is taken as input to a language model, a larger fraction of the compute budget should be allocated to growing the learning model rather than training data set.
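The flavor of "allocations of computation that minimize this bound" can be shown with a toy worked example. Both ingredients below are illustrative assumptions standing in for the paper's actual cost model and bound: compute is taken as $C \approx N \cdot D$ (model size times data set size) and the error bound has the form $a/N + b/D$. Minimizing the bound under the budget then fixes the ratio of $N$ to $D$, so both grow with $C$:

```python
import numpy as np

def optimal_split(C, a=1.0, b=1.0):
    """Minimize a/N + b/D subject to N * D = C (continuous relaxation).
    Substituting D = C/N gives f(N) = a/N + b*N/C, and f'(N) = 0 at N = sqrt(a*C/b)."""
    N = np.sqrt(a * C / b)
    return N, C / N

def err_bound(N, D, a=1.0, b=1.0):
    return a / N + b / D

N_opt, D_opt = optimal_split(100.0)  # symmetric costs -> N = D = 10
```

With symmetric coefficients the budget splits evenly; a larger $a$ (harder function class, e.g. higher input dimension) tilts the optimum toward a larger model, which mirrors the abstract's qualitative conclusion.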